
    Artimate: an articulatory animation framework for audiovisual speech synthesis

    We present a modular framework for articulatory animation synthesis using speech motion capture data obtained with electromagnetic articulography (EMA). Adapting a skeletal animation approach, the articulatory motion data are applied to a three-dimensional (3D) model of the vocal tract, creating a portable resource that can be integrated into an audiovisual (AV) speech synthesis platform to provide realistic animation of the tongue and teeth for a virtual character. The framework also provides an interface to articulatory animation synthesis, as well as an example application that illustrates its use with a 3D game engine. We rely on cross-platform, open-source software and open standards to provide a lightweight, accessible, and portable workflow.
    Comment: Workshop on Innovation and Applications in Speech Technology (2012)
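
    The core step the abstract describes, driving a rigged 3D model from EMA motion capture, can be sketched roughly as follows. This is a minimal illustration with hypothetical sensor and rest-pose data, not the Artimate implementation:

```python
import numpy as np

def ema_to_bone_offsets(ema_frames, rest_pose):
    """Turn raw EMA sensor trajectories (frames x sensors x 3, in mm)
    into per-frame translation offsets relative to a rest pose; in a
    skeletal rig, each offset would then drive the bone attached to
    that sensor (e.g. tongue tip, tongue back, jaw)."""
    return np.asarray(ema_frames, dtype=float) - np.asarray(rest_pose, dtype=float)

# hypothetical data: two frames, two sensors
rest = [[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]
frames = [[[1.0, 0.5, 0.0], [10.0, 1.0, 0.0]],
          [[2.0, 1.0, 0.0], [11.0, 2.0, 0.0]]]
offsets = ema_to_bone_offsets(frames, rest)
# offsets[0, 0] is the displacement of the first sensor in the first frame
```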

    Introduction de contraintes pour l'inversion acoustico-articulatoire utilisant une table hypercubique

    National conference with proceedings and peer review. National audience.
    Our acoustic-to-articulatory inversion method exploits an original codebook representing the articulatory space by hypercubes. The articulatory space is decomposed into regions where the articulatory-to-acoustic mapping is linear. Each region is represented by a hypercube. The inversion procedure retrieves articulatory vectors corresponding to an acoustic entry from the hypercube codebook. As the dimension of the articulatory space is greater than that of the acoustic space, the corresponding null space is sampled by linear programming to retrieve all the possible solutions. A dynamic procedure is used to recover the best articulatory trajectory according to a minimum articulatory rate criterion. The addition of constraints allows the inversion process to be focused on realistic inverse articulatory trajectories.
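
    The piecewise-linear inversion described above can be illustrated on a toy region. Assuming a locally linear map y = Ax + b inside one hypercube, the solution set is a particular solution plus the null space of A; the matrix here is invented, and the null space is sampled on a simple grid rather than by linear programming as in the paper:

```python
import numpy as np

def invert_in_region(A, b, y, n_samples=5):
    """Invert the locally linear articulatory-to-acoustic map y = A x + b
    within one hypercube region. Since the articulatory dimension exceeds
    the acoustic one, all solutions are a particular solution plus any
    vector from the null space of A, sampled here along each basis vector."""
    x_p, *_ = np.linalg.lstsq(A, y - b, rcond=None)  # particular solution
    _, s, vt = np.linalg.svd(A)
    null_basis = vt[len(s):]                         # rows spanning null(A)
    coeffs = np.linspace(-1.0, 1.0, n_samples)
    return [x_p + c * n for n in null_basis for c in coeffs]

# toy example: 2 formants predicted from 3 articulatory parameters
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
b = np.zeros(2)
solutions = invert_in_region(A, b, np.array([1.0, 2.0]))
# every sampled articulatory vector maps back to the same acoustic entry
```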

    Mixing faces and voices: a study of the influence of faces and voices on audiovisual intelligibility

    International audience
    This study examined the influence of mixing faces and voices on audiovisual intelligibility. The goal was to study the effect of combining the two sources of information on audiovisual intelligibility. Cross-talker dubbing was performed between the faces and voices of 10 meaningful sentences pronounced by 10 talkers: 5 females and 5 males. Human subjects were asked to rate the articulation of the output videos, and the results for the original and dubbed videos were compared. Across almost all combinations, the audiovisual intelligibility was acceptable; the intelligibility of the speakers varied, however. We observed an influence of the audio/visual channel on the overall intelligibility, which can increase or decrease depending on the intelligibility of that channel.

    Towards an articulatory tongue model using 3D EMA

    International audience
    Within the framework of an acoustic-visual (AV) speech synthesizer, we describe a preliminary tongue model that is both simple and flexible, and which is controlled by 3D electromagnetic articulography (EMA) data through an animation interface, providing realistic tongue movements for improved visual intelligibility. Data from a pilot study are discussed and deemed encouraging, and the integration of the tongue model into the AV synthesizer is outlined.

    Utilisation d'un dictionnaire hypercubique pour l'inversion acoustico-articulatoire

    National conference with proceedings and peer review. National audience.
    In this article, we present a method for constructing an articulatory codebook that gives good coverage of the articulatory space with a limited number of points. It is a new representation of the articulatory space by hypercubes. For each vertex of a hypercube, both the articulatory parameters and the acoustic parameters (the formants) are known. We present an interpolation method for computing, with high precision, the acoustic trajectory corresponding to an articulatory trajectory, and an inversion method that recovers the articulatory parameters from the formants. The strength of the inversion method based on this codebook is its robustness to the non-linearity of the articulatory-to-acoustic relation.
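
    The interpolation step, computing an acoustic value anywhere inside a hypercube from the values known at its vertices, can be sketched as plain multilinear interpolation. The region and formant values below are invented for illustration:

```python
import numpy as np

def hypercube_interpolate(vertex_values, t):
    """Multilinear interpolation inside a unit hypercube: vertex_values
    maps each corner (a tuple of 0/1 coordinates) to a known acoustic
    value (e.g. a formant in Hz), and t holds the articulatory
    coordinates in [0, 1] along each axis."""
    total = 0.0
    for corner, value in vertex_values.items():
        weight = np.prod([ti if c else 1.0 - ti for c, ti in zip(corner, t)])
        total += weight * value
    return total

# hypothetical 2-D region: F1 values at the four corners of a square
corners = {(0, 0): 500.0, (1, 0): 700.0, (0, 1): 600.0, (1, 1): 800.0}
f1 = hypercube_interpolate(corners, [0.5, 0.5])  # → 650.0
```

    At a vertex the weights collapse to the stored value, so the interpolant is exact on the codebook points.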

    Predicting Tongue Positions from Acoustics and Facial Features

    International audience
    We test the hypothesis that adding information about the positions of electromagnetic articulography (EMA) sensors on the lips and jaw can improve the results of a typical acoustic-to-EMA mapping system, based on support vector regression, that targets the tongue sensors. Our initial motivation is to use such a system to add tongue animation to a talking head built by concatenating bimodal acoustic-visual units. For completeness, we also train a system that maps only jaw and lip information to tongue information.
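
    The effect being tested, whether appending lip and jaw positions to the acoustic features improves tongue prediction, can be sketched on synthetic data. Plain ridge regression stands in here for the support vector regression actually used, and all dimensions and data are invented:

```python
import numpy as np

def fit_linear_map(X, Y, ridge=1e-3):
    """Ridge regression (a stand-in for SVR): learn a weight matrix W
    mapping input feature rows to tongue-sensor coordinate rows."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T @ Y)

rng = np.random.default_rng(0)
acoustic = rng.normal(size=(200, 13))   # e.g. one MFCC vector per frame
lips_jaw = rng.normal(size=(200, 6))    # 2 sensors x 3 coords (hypothetical)
# synthetic tongue positions that genuinely depend on both streams
tongue = acoustic[:, :3] + 0.5 * lips_jaw[:, :3]

W_ac = fit_linear_map(acoustic, tongue)
W_both = fit_linear_map(np.hstack([acoustic, lips_jaw]), tongue)
err_ac = np.mean((acoustic @ W_ac - tongue) ** 2)
err_both = np.mean((np.hstack([acoustic, lips_jaw]) @ W_both - tongue) ** 2)
# adding the lip/jaw features lowers the fitting error on this toy data
```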

    Image processing device

    An episodic memory-based solution for the acoustic-to-articulatory inversion problem

    International audience
    This paper presents an acoustic-to-articulatory inversion method based on an episodic memory. An episodic memory is an interesting model for two reasons. First, it does not rely on any assumptions about the mapping function; instead, it relies on real synchronized acoustic and articulatory data streams. Second, the memory inherently represents the real articulatory dynamics as observed. It is argued that computational models of episodic memory, as they are usually designed, cannot provide a satisfying solution to the acoustic-to-articulatory inversion problem due to the insufficient quantity of training data. Therefore, a generative episodic memory (G-Mem) is proposed, which is able to produce articulatory trajectories that do not belong to the set of episodes the memory is built on. It is evaluated using two electromagnetic articulography corpora, one for English and one for French, and compared with a codebook-based method and with a classical episodic memory (termed a concatenative episodic memory) in terms of both its modeling of articulatory dynamics and its generalization capabilities. The results show the effectiveness of the method: an overall root-mean-square error of 1.65 mm and a correlation of 0.71 are obtained for G-Mem, comparable to those of recently proposed methods.
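
    The generative aspect can be illustrated with a frame-level nearest-neighbour step: averaging the articulatory frames synchronized with the k closest stored acoustic frames yields articulatory vectors that need not appear in any stored episode. This is a deliberately simplified stand-in for G-Mem, with invented toy data:

```python
import numpy as np

def knn_inverse(acoustic_mem, artic_mem, query, k=3):
    """Retrieve the k stored acoustic frames closest to the query and
    return a distance-weighted average of their synchronized
    articulatory frames."""
    d = np.linalg.norm(acoustic_mem - query, axis=1)
    idx = np.argsort(d)[:k]
    w = 1.0 / (d[idx] + 1e-9)  # closer frames weigh more
    return (w[:, None] * artic_mem[idx]).sum(axis=0) / w.sum()

# toy memory: 4 synchronized acoustic/articulatory frame pairs
acoustic_mem = np.array([[0.0], [1.0], [2.0], [10.0]])
artic_mem = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [9.0, 9.0]])
out = knn_inverse(acoustic_mem, artic_mem, np.array([1.2]), k=2)
# out interpolates between stored frames rather than copying one of them
```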

    Continuous episodic memory based speech recognition using articulatory dynamics

    International audience
    In this paper we present a speech recognition system based on articulatory dynamics. We do not extend the acoustic features with any explicit articulatory measurements; instead, the articulatory dynamics of speech are structurally embodied within episodic memories. The proposed recognizer is made of different memories, each specialized for a particular articulator. As the articulators do not all contribute equally to the realization of a particular phoneme, the specialized memories do not perform equally for each phoneme. We show, through phone-string recognition experiments, that combining the recognition hypotheses produced by the different articulator-specialized memories leads to significant recognition improvements.
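
    A minimal version of the combination step is a frame-wise majority vote over the phone streams emitted by the articulator-specialized memories; the real system's combination scheme is likely more elaborate, and the streams below are invented:

```python
from collections import Counter

def combine_hypotheses(streams):
    """Frame-wise majority vote over phone hypotheses produced by
    several articulator-specialized recognizers."""
    return [Counter(frame).most_common(1)[0][0] for frame in zip(*streams)]

# hypothetical per-frame phone hypotheses from three specialized memories
tongue = ["p", "a", "t", "a"]
lips   = ["b", "a", "t", "a"]
jaw    = ["p", "a", "d", "a"]
combine_hypotheses([tongue, lips, jaw])  # → ['p', 'a', 't', 'a']
```

    A memory that is unreliable for a given phoneme is outvoted by the memories of the articulators that matter for it.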

    A Study of the French Vowels Through the Main Constriction of the Vocal Tract Using an Acoustic-to-Articulatory Inversion Method

    International conference with proceedings and peer review. International audience.
    This paper presents a study of the articulatory properties of French vowels using an acoustic-to-articulatory inversion method. The advantage of such an approach is that all the possible articulatory configurations can be studied independently of any articulatory preferences linked to a given speaker. Furthermore, it bypasses the issue of acquiring a vast amount of articulatory data through medical imaging techniques. The inversion method exploits an articulatory codebook whose acoustic precision is constant whatever the articulatory region considered. Since the inversion is performed from the first three formants of the vowels to recover the seven parameters of Maeda's model, the null space of the articulatory-to-acoustic mapping is explored to recover all the possible articulatory shapes. Applied to French vowels, this method allows the different places of articulation to be determined.